12 research outputs found

    A Broad Evaluation of the Tor English Content Ecosystem

    Tor is among the most well-known dark nets in the world. It has noble uses, including as a platform for free speech and information dissemination under the guise of true anonymity, but may be culturally better known as a conduit for criminal activity and as a platform to market illicit goods and data. Past studies on the content of Tor support this notion, but were carried out by targeting popular domains likely to contain illicit content. A survey of past studies may thus not yield a complete evaluation of the content and use of Tor. This work addresses this gap by presenting a broad evaluation of the content of the English Tor ecosystem. We perform a comprehensive crawl of the Tor dark web and, through topic and network analysis, characterize the types of information and services hosted across a broad swath of Tor domains and their hyperlink relational structure. We recover nine domain types defined by the information or service they host and, among other findings, unveil how some types of domains intentionally silo themselves from the rest of Tor. We also present measurements that (regrettably) suggest that marketplaces of illegal drugs and services emerge as the dominant type of Tor domain. Our study is the product of crawling over 1 million pages from 20,000 Tor seed addresses, yielding a collection of over 150,000 Tor pages. We intend to make the domain structure publicly available as a dataset at https://github.com/wsu-wacs/TorEnglishContent. Comment: 11 pages

    Some (Non-)universal features of Web robot traffic

    Understanding the qualities of Web robot traffic is essential to build mechanisms for mitigating the impact of their traffic on Web systems. This paper presents an updated characterization of the navigational and session patterns of Web robot traffic across three Web servers in the United States, Europe, and the Middle East under 30 different features. The results indicate that some features may be fitted to the same heavy-tailed model across the Web servers, but the best-fitting models for other features depend on the Web server. Because Web robots perform different tasks and website administrators set different security policies, some features of Web robot traffic cannot be universally modeled.

    A First Look at References from the Dark to Surface Web World

    Tor is one of the most well-known networks that protects the identity of both content providers and their clients against any tracking or tracing on the Internet. So far, most research attention has been focused on investigating the security and privacy concerns of Tor and characterizing the topic or hyperlink structure of its hidden services. However, there is still a lack of knowledge about the information leakage attributed to linking from Tor hidden services to the surface Web. This work addresses this gap by presenting a broad evaluation of the network of references from Tor to the surface Web and investigates to what extent Tor hidden services are vulnerable to this type of information leakage. The analyses also consider how linking to surface websites can change the overall hyperlink structure of Tor hidden services. They also report on the type of information and services provided by Tor domains. Results recover the dark-to-surface network as a single massive connected component where over 90% of Tor hidden services have at least one link to the surface world despite their interest in being isolated from surface Web tracking. We identify that Tor directories have the closest proximity to all other Web resources and significantly contribute to both communication and information dissemination through the network, which emphasizes the main application of Tor as an information provider to the public. Our study is the product of crawling nearly 2 million pages from 23,145 onion seed addresses over a three-month period.

    Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection

    Phishing, one of the most well-known cybercrime activities, is the deception of online users to steal their personal or confidential information by impersonating a legitimate website. Several machine learning-based strategies have been proposed to detect phishing websites. These techniques are dependent on the features extracted from the website samples. However, few studies have actually considered efficient feature selection for detecting phishing attacks. In this work, we investigate an agreement on the definitive features which should be used in phishing detection. We apply Fuzzy Rough Set (FRS) theory as a tool to select the most effective features from three benchmarked data sets. The selected features are fed into three commonly used classifiers for phishing detection. To evaluate FRS feature selection in developing generalizable phishing detection, the classifiers are trained by a separate out-of-sample data set of 14,000 website samples. The maximum F-measure gained by FRS feature selection is 95% using Random Forest classification. Also, there are 9 universal features selected by FRS over all three data sets. The F-measure value using this universal feature set is approximately 93%, which is comparable to the FRS performance. Since the universal feature set contains no features from third-party services, this finding implies that with no inquiry from external sources, we can gain faster phishing detection which is also robust to zero-day attacks.

    Interaction of Structure and Information on Tor

    Tor is the most popular dark network in the world. It provides anonymous communications using unique application layer protocols and authorization schemes. Noble uses of Tor, including as a platform for censorship circumvention, free speech, and information dissemination, make it an important socio-technical system. Past studies on Tor investigate either its information or its structure exclusively. However, activities in socio-technical systems, including Tor, need to be understood by considering both structure and information. This work attempts to address the present gap in our understanding of Tor by scrutinizing the interaction between the structural identity of Tor domains and their type of information. We conduct a micro-level investigation of the neighborhood structure of Tor domains using struc2vec and classify the extracted structural identities by hierarchical clustering. Our findings reveal that the structural identity of Tor services can be categorized into eight distinct groups. One group belongs only to Dream market services, where the neighborhood structure is almost fully connected and thus robust against node removal or targeted attack. Domains with different types of services form the other clusters based on whether they have links to Dream market or to domains with low/high out-degree centrality. Results indicate that the structural identity created by linking to services with significant out-degree centrality is the dominant structural identity for Tor services.